-
Notifications
You must be signed in to change notification settings - Fork 24
Upgrade client-python to new protocol for grakn 1.7 #88
Upgrade client-python to new protocol for grakn 1.7 #88
Conversation
| response = self._communicator.send(request) | ||
| # convert `response` into a python iterator | ||
| return ResponseReader.ResponseReader.query(self, response.query_iter) | ||
| return imap(ResponseReader.ResponseReader.query(self), Iterator(self._communicator, RequestBuilder.start_iterating_query(query, infer))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write a comment on what this imap does here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mapping the response reader to each result of the iterator, is the comment definitely necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just hard to read what is intended to be going on here: are we mappingthe query() function with new arguments generated by the iterator? Does that mean the query(self) method returns a function that takes further arguments?
Functional style programming is great but i find it hard to follow when mixed in with other styles :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the imap to ResponseReader to be consistent with the style for get_attributes_by_value. This method previously existed and was called query but returned an iterator that it created, since iteration was handled by ResponseReader prior. I have also renamed the method to get_query_results now, and it takes an iterator as as parameter rather than being mapped (so that the imap and the lambda are in the same place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think if you rename query(self) above like i mentioned in other comment it will help a lot, right now this just isn't readable standalone :)
| self._communicator = communicator | ||
| self._iter_req = iter_req | ||
| self._buffer = [] | ||
| self._start_iterating() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended that we start iterating right away? We aren't lazy anymore then right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't immediately send the query then define and insert queries, which are not expected to have results, will not get executed (since currently the iterators are not executed).
We can choose to use a batch size of 0 for the first iteration, but that would introduce an extra round trip to gets where we are actually expecting a result.
Also, in general, being lazy is undesirable on the network, since latency is nearly always part of the critical path of the operation (or at least, should be after optimisation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah i forgot about define and insert queries, this is important.
| def _create_answer_group(tx_service, grpc_answer_group): | ||
| grpc_owner_concept = grpc_answer_group.owner | ||
| owner_concept = ConceptFactory.create_concept(tx_service, grpc_owner_concept) | ||
| owner_concept = ConceptFactory.create_remote_concept(tx_service, grpc_owner_concept) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we require remote concepts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but I haven't done any analysis on what might be necessary to pre-fetch here, so this is how we currently do it in Java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you try using a local concept and see if it breaks any tests? It shouldn't need any remote functionality...
|
Might want to rename the PR title properly and elaborate on the description, though, @adammitchelldev . They will be used for the release notes. |
|
To be clear, this isn't doing the asynchronous refiling of a shared buffer right? Ie. the channel will be blocked after calling the batch iterator until the batch is received? |
|
Also I think you could improve description of the goal to include "Reduce round trips and read query latency" or similar, so people see it in the release notes :) |
| iterator_id, | ||
| lambda tx_serv, iterate_res: AnswerConverter.convert(tx_serv, iterate_res.query_iter_res.answer)) | ||
|
|
||
| def query(tx_service): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so i think this method's naming is the incorrect - this is really the ReasonerReader.from_iterate_response or something right? query is too generic and doesn't carry any meaning anymore
What is the goal of this PR?
Update client-python to the new 1.0.5 protocol with changed iteration style and local/remote concept API split.
The new protocol significantly reduces the number of round trips necessary to complete queries, down to 1 in most cases, from the previous solution which required some multiple of the number of results. This should alleviate slowdown problems caused by connecting to a remote Grakn instance. Iteration now follows a streaming principle and results are pre-filled with data the user would normally be expected to retrieve via the concept API.
The step to split local and remote concepts helps us move towards "value" results rather than results which need to embed the transaction they were retrieved with. The added requirement to convert these concepts back to remote concepts should also make it clear to the user which methods require additional RPCs (where it wasn't clear beforehand that ALL methods required additional RPCs).
What are the changes implemented in this PR?